

Approved for public release; distribution unlimited.




Microsystems Research Center
Massachusetts Institute of Technology
Room 39-321, Cambridge, Massachusetts 02139
Telephone (617) 253-8138


Acknowledgements 

To appear in the ACM Symposium on Theory of Computing, May 1989. This
research was supported in part by the Defense Advanced Research Projects
Agency under contract numbers N00014-80-C-0622 and N00014-87-K-0825, the
Air Force under contract number AFOSR-86-0078, and the Army under contract
number DAAL03-86-K-0171. Leighton was supported in part by an NSF
Presidential Young Investigator Award with matching funds from IBM
Corporation.


Author  Information 

Hastad,  current  address:  Royal  Institute  of  Technology,  Stockholm,  Sweden. 
Leighton:  Department  of  Mathematics  and  Laboratory  for  Computer  Science, 
Room  NE43-332,  MIT,  Cambridge,  MA  02139.  (617)  253-5876. 
Newman:  Department  of  Mathematics  and  Laboratory  for  Computer  Science, 
Room  NE43-334,  MIT,  Cambridge,  MA  02139.  (617)  253-5866 


Copyright © 1989 MIT. Memos in this series are for use inside MIT and are not
considered to be published merely by virtue of appearing in this series. This copy
is for private circulation only and may not be further copied or distributed, except
for government purposes, if the paper acknowledges U. S. Government sponsorship.
References to this work should be either to the published version, if any, or
in the form "private communication." For information about the ideas expressed
herein, contact the author directly. For information about this series, contact
Microsystems Research Center, Room 39-321, MIT, Cambridge, MA 02139;
(617) 253-8138.


Fast  Computation  Using  Faulty  Hypercubes 

(extended  abstract) 

Johan  Hastad  *  Tom  Leighton  *  Mark  Newman  * 


Abstract 

We consider the computational power of a hypercube
containing a potentially large number of randomly located
faulty components. We describe a randomized
algorithm which embeds an N-node hypercube in an
N-node hypercube with faulty processors. Provided that
the processors of the N-node hypercube are faulty with
probability $p < 1$, and that the faults are independently
distributed, we show that with high probability the
faulty hypercube can emulate the fault-free hypercube
with only constant slowdown. In other words, an N-node
hypercube with faults can simulate $T$ steps of an
N-node fault-free hypercube in $O(T)$ steps. The embedding
is easy to construct in polylogarithmic time using
only local control. We also describe $O(\log N)$-step routing
algorithms which ensure the delivery of messages
with high probability even when a constant fraction of
the nodes and edges have failed. The routing results represent
the first adaptive routing algorithms for which an
effective theoretical analysis has been achieved.

1  Introduction 

The hypercube is emerging as one of the most effective
and popular network architectures for large scale parallel
computers. Already, hypercube-based machines containing
$2^{16}$ processing elements have been manufactured
and sold commercially, and it is possible that in the
not-too-distant future, hypercube-based machines containing
millions of processors will be available. A main
concern with the development of such large scale systems
is fault tolerance. In particular, it is crucial that
such a system be able to withstand an accumulation of
faults among a reasonable number of its components.

In this paper, we investigate the tolerance of the hypercube
to randomly distributed faults. Our results are
very encouraging and, in some areas, striking. For example,
provided that the processors of the N-node hypercube
are faulty with probability $p < 1$, and that
the faults are independently distributed, we show that,
with high probability, the faulty hypercube can simulate
a fault-free N-node hypercube with only constant
slowdown. Of course, the constant factor slowdown depends
on $p$ (since only about a $(1-p)$ fraction of the processors
work, it must always be at least $\frac{1}{1-p}$), but
it does not depend on $N$. Moreover, the algorithm for
reconfiguring the faulty hypercube is simple, needs only
local control, and runs in only a polylogarithmic number
of steps. Hence the faulty hypercube can quickly reconfigure
itself to appear fault-free (except for the constant
slowdown) without the intervention of a central control.

The reconfiguration algorithm for faulty hypercubes
described in this paper represents a significant improvement
over previous work. In particular, although some
known reconfiguration algorithms preserve locality (i.e.,
neighbors in the virtual fault-free hypercube are simulated
by nodes within constant distance of one another
in the faulty hypercube), the best previous simulation
algorithms used off-line and nonconstructive
reconfiguration strategies to obtain slowdowns of size
$\Theta(\sqrt{\log\log N})$ (see [HLN]). The best previous on-line
algorithms result in slowdowns of size $O(\sqrt{\log N})$ (also
in [HLN]). The main problem with these algorithms is
that they require a nonconstant number of wires in the
virtual fault-free hypercube to be routed across a single
wire of the faulty hypercube. Hence, a constant
time simulation using these algorithms is not possible.
Otherwise, there has been relatively little previous work
on reconfiguring faulty hypercubes that contain more
than a few faults. The only exceptions (of which we are
aware) are the work of Becker and Simon ([BS]) and
Graham, Harary, Livingston and Stout ([GHLS]), who
consider fault-free subcubes of a hypercube containing
worst case faults. The constraint that the embedded
cube be a subcube (i.e., dilation 1) is very restrictive, as
is the assumption that faults are located in a worst case
fashion. Hence, the techniques and results of [BS] and
[GHLS] are quite different from those presented here.
Dolev, Halpern, Simons and Strong ([DHSS]) also considered
faults in a worst case model. Routes were chosen
from a predetermined set of paths. Their techniques and
results also differ markedly from ours.

In the latter sections of the paper, we consider the
problem of packet routing on a hypercube with faults.
Of course, once we have reconfigured a faulty hypercube,
we can simply use the classical algorithms to route
any permutation of $N$ packets in $O(\log N)$ steps. However,
the question of routing in a fault-tolerant fashion
directly on a faulty hypercube is still of some interest.
For example, can we handle faults that are occurring
dynamically without having to reconfigure all the
time? Also, are there efficient routing algorithms that
are adaptive in the sense that packets can be routed locally
around faults without knowing in advance where
the faulty components lie? Affirmative answers to these
questions could well be of interest in practice and in theory,
since all known $O(\log N)$-step routing algorithms on
a hypercube are inherently non-adaptive.

Motivated by such issues, we describe and analyze
a randomized packet routing algorithm that adaptively
routes packets around faults as they are encountered
in an N-node hypercube that contains $\Theta(N)$ randomly
located faulty nodes and $\Theta(N \log N)$ randomly located
faulty edges. We prove that the algorithm routes any
permutation on the working processors in $O(\log N)$
steps with high probability. Packets that start or end
at faulty nodes are eventually determined to be undeliverable.
All the deliverable packets arrive at their destinations
provided that they are not located in the immediate
vicinity of a processor at the moment when it
fails. The algorithm is fault-tolerant in the sense that
no advance knowledge of the locations of the faults is
needed for the path selection, but it is not completely
tolerant of nodes which fail while holding packets. The
algorithm is of interest because during most steps, few
processors will fail and almost all deliverable packets
will be delivered. In addition, the algorithm itself is
quite simple and is the first adaptive routing algorithm
for which an $O(\log N)$ bound on the routing time has
been achieved.

There has been some previous work on packet routing
in faulty hypercubes. Most notably, Rabin ([R]) has
devised an elegant scheme wherein each packet to be
routed is decomposed into $\Theta(\log N)$ pieces. The pieces
are routed in a randomized nonadaptive fashion to their
destinations and then recombined to form the original
message. A key aspect of the scheme is that the packet
decomposition uses error-correcting codes. Therefore,
only a constant fraction of the pieces of any packet need
to get through to the destination for the packet to be reconstructed.
Such a scheme can be useful if the original
packets represent relatively long bit streams. In particular,
the original packets must have length $\Omega(\log^2 N)$.
Additionally, at most 0( j^t) edge faults can be absorbed.
Under these conditions, the Rabin algorithm
provides a fully fault-tolerant routing of $N$ packets in
$O(\log N)$ steps with high probability.

In this paper, we show how the Rabin result can be
achieved with a simpler algorithm and analysis. Our
analysis permits the routing to absorb up to $O(N)$ edge
faults as well as O(jj^) node faults. (A similar result
with a more complicated algorithm and analysis
has recently been discovered by Giladi ([G]).) We also
briefly sketch a way to potentially improve its tolerance
to faults in as many as a constant fraction of components
by combining the decomposition scheme with our
adaptive algorithm for routing around faults.

Before describing our results further, it is useful to
review some terminology. First, we describe the
$n$-dimensional, $N$-node hypercube $H_n$, $N = 2^n$. The
nodes of $H_n$ are denoted by $n$-bit binary strings, and
two nodes are linked by an edge if the associated strings
differ in precisely one bit. If the differing bit is in the $i$th
position ($1 \le i \le n$), then the associated edge is called a
dimension $i$ edge. The neighbor of a node $v$ across the
$i$th dimension will be denoted by $v^i$. Similarly, $v^{i_1 i_2 \cdots i_k}$
will denote the node reached from $v$ by traversing dimensions
$i_1, i_2, \ldots, i_k$. For the rest of this paper we will
use $n$ and $\log N$ interchangeably.
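With node labels stored as $n$-bit integers, the notation $v^i$ and $v^{i_1 i_2 \cdots i_k}$ reduces to XOR with single-bit masks. The following is a minimal Python sketch; the function names are hypothetical, and mapping dimension $i$ to bit position $i-1$ is our assumed convention, not one fixed by the text.

```python
from functools import reduce

def neighbor(v: int, i: int) -> int:
    """v^i: the neighbor of v across dimension i (1 <= i <= n).
    Dimension i is mapped to bit position i - 1 (an assumed convention)."""
    return v ^ (1 << (i - 1))

def traverse(v: int, dims) -> int:
    """v^{i_1 i_2 ... i_k}: the node reached from v by crossing dims in order."""
    return reduce(neighbor, dims, v)

def is_edge(u: int, w: int) -> bool:
    """True iff the labels of u and w differ in exactly one bit."""
    x = u ^ w
    return x != 0 and x & (x - 1) == 0

def dimension_of(u: int, w: int) -> int:
    """Dimension (1-based) of the edge (u, w); assumes is_edge(u, w)."""
    return (u ^ w).bit_length()

print(traverse(0b000, [1, 3]))   # 5, i.e. the label 101
```

Because XOR is commutative, the order in which dimensions are listed does not change the endpoint, and crossing the same dimension twice returns to the start.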

An embedding of a virtual fault-free hypercube $H'_n$
into a faulty hypercube $H_n$ is a map $\phi : H'_n \to H_n$ that
maps nodes of $H'_n$ to functioning nodes of $H_n$ and edges
of $H'_n$ to functioning paths of $H_n$. In this paper, a path
is said to be functioning if all the nodes and edges on the
path are fault-free. (In some other papers ([HLN], [A]),
there are models which allow the nodes on the path to
be faulty.) The dilation of an embedding is the length of
the longest path $\phi(e)$ in $H_n$ corresponding to an edge $e$ in
$H'_n$. The load of an embedding is the maximum number
of nodes of $H'_n$ mapped to a single node of $H_n$. The
congestion of an embedding is the maximum number of
paths $\phi(e)$ that use a single edge of $H_n$. Expressed in
these terms, our task is to find an embedding of $H'_n$ in
$H_n$ with constant dilation, load and congestion. Once
this is done, $H_n$ will be able to simulate $H'_n$ with
constant slowdown.
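The three measures can be computed directly from a node map and a path map. The sketch below is illustrative only: the dictionary representation and the function name are hypothetical, and the tiny example embeds $H'_2$ into an $H_2$ whose node 3 is faulty.

```python
from collections import Counter

def embedding_metrics(phi, paths):
    """phi: virtual node -> host node; paths: virtual edge (u, w) -> list of
    host nodes from phi[u] to phi[w]. Returns (load, dilation, congestion)."""
    for (u, w), path in paths.items():
        assert path[0] == phi[u] and path[-1] == phi[w], "path must join the images"
    load = max(Counter(phi.values()).values())
    dilation = max(len(path) - 1 for path in paths.values())
    # host edges are undirected, so count each one as a frozenset of endpoints
    use = Counter(frozenset(e) for path in paths.values() for e in zip(path, path[1:]))
    congestion = max(use.values(), default=0)
    return load, dilation, congestion

# H'_2 into H_2 with host node 3 faulty: virtual node 3 is simulated by host node 1.
phi = {0: 0, 1: 1, 2: 2, 3: 1}
paths = {(0, 1): [0, 1], (0, 2): [0, 2], (1, 3): [1], (2, 3): [2, 0, 1]}
print(embedding_metrics(phi, paths))   # (2, 2, 2)
```

A constant-slowdown simulation needs all three numbers bounded by constants independent of $N$.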

In this paper, we will consider a fault model where
nodes and edges fail independently with probability
$p < 1$. For most of the proofs, we will focus on the
case where $p < .1$ and only nodes fail. We later explain
how to handle larger probabilities of failure and edge
failures. The case where $p < .1$ is easier because each
node will then have a reasonably large neighborhood of
functioning processors with high probability.

In reality, it is unlikely that so many processors will
fail in a short period of time. However, our results are all
upper bounds, and the case where only a small number
of components fail is even easier. Also, if a hypercube-based
machine is inaccessible or vulnerable to catastrophe,
then it may be possible that a large number of faults
occur over time. This approach can also be applied in a
hierarchical fashion if the faults are not independently
located. For example, if entire subcubes are likely to
fail as a unit (e.g., an entire chip, board or box fails)
but units fail independently, then the same results apply.
Even if a few locally concentrated faults occur,
our results apply to the affected region. This analysis
cannot apply to a worst case allocation of faults, since
by selectively killing $O(N)$ nodes we can disconnect the
hypercube into components of size $O(N/\sqrt{\log N})$.

Our proofs use a fair amount of probabilistic and combinatorial
analysis. Although no individual step in the
analysis is particularly difficult or noteworthy, there are
some  approaches  to  these  problems  that  may  be  of  use 
with  other  hypercube-related  problems.  In  particular, 
there  is  one  simple  observation  that  is  used  in  various 
forms  throughout  the  paper.  Although  the  observation 
has  probably  been  made  by  others,  it  is  basic  enough 
that  we  think  it  worth  highlighting  as  a  paradigm  for 
distributed  match-making. 

We will describe the result in its most basic form.
Consider a collection of $\Theta(N)$ men and $O(N)$ women
at a dance. Assume that each man has at least $\Omega(X)$
female friends and that each woman has at most $O(X)$
male friends. By Hall's marriage theorem, it is possible
to schedule $O(1)$ rounds of dances so that every
man dances with at least one friend and every woman
dances at most $O(1)$ times. Unfortunately, the problem
of scheduling dance partners requires substantial global
coordination. For our purposes, we focus on a scenario
where pairing is accomplished simply by a man asking a
woman to dance. If many men ask a woman to dance at
once, she accepts as many as she can, making sure not to
exceed her capacity of $C = O(1)$ dances for the evening.
If she can only accept some of the men, she prefers
the tallest among them. Each man chooses a friend
randomly for each dance (without knowledge of which
women are tired or which women other men are asking)
until he dances. The result (which we call the Dance
Hall Theorem, pun intended) is that if $X = \Theta(\log N)$
and there are $\Omega(\log N)$ dances, then with high probability
(i.e., with probability exceeding $1 - O(N^{-c})$ for some
constant $c > 1$) every man will dance during the course
of the evening.
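The decentralized protocol behind the Dance Hall Theorem is easy to simulate. The Python sketch below is an illustration under stated assumptions, not the authors' construction: friend lists are drawn uniformly at random, `height` is a hypothetical array of distinct heights (playing the role that node labels play later), and the parameter values are arbitrary choices with $X$ and the number of dances both $\Theta(\log N)$.

```python
import random
from collections import defaultdict

def dance_hall(friends, num_women, capacity, rounds, height, seed=0):
    """Each round, every man who has not yet danced asks one friend chosen
    uniformly at random. A woman accepts as many askers as her remaining
    capacity allows, preferring the tallest. Returns (men who never danced,
    remaining capacity of each woman)."""
    rng = random.Random(seed)
    left = [capacity] * num_women
    waiting = set(range(len(friends)))
    for _ in range(rounds):
        asks = defaultdict(list)
        for m in sorted(waiting):
            asks[rng.choice(friends[m])].append(m)      # no global coordination
        for w, askers in asks.items():
            askers.sort(key=lambda m: height[m], reverse=True)  # tallest first
            taken = askers[:left[w]]
            left[w] -= len(taken)
            waiting.difference_update(taken)
        if not waiting:
            break
    return waiting, left

N = 256
X = 3 * N.bit_length()                  # hypothetical: Theta(log N) friends per man
rng = random.Random(7)
friends = [rng.sample(range(N), X) for _ in range(N)]
height = rng.sample(range(N), N)        # hypothetical distinct heights
never_danced, left = dance_hall(friends, N, capacity=4,
                                rounds=2 * N.bit_length(), height=height)
```

No woman ever exceeds her capacity by construction; the theorem is the probabilistic statement that, with these orders of magnitude, `never_danced` is empty with high probability.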

The Dance Hall Theorem scenario first arises in our
analysis when we attempt to embed the nodes of $H'_n$
in the functioning nodes of $H_n$. The nodes of $H'_n$ correspond
to men and the functioning nodes of $H_n$ correspond
to women. If a man dances with a woman,
then the corresponding node of $H'_n$ will be simulated
by the corresponding node of $H_n$. We need the Dance
Hall Theorem to ensure that the load of the embedding
is $O(1)$ (i.e., every woman dances with $O(1)$ men)
and to ensure that the embedding can be constructed
quickly with local control (no global matchmaker). We
also need some other as-yet-undescribed properties of
the Dance Hall Theorem schedule to ensure that the dilation
and congestion of the embedding are $O(1)$, but
these are more technical in nature and will be dealt with
in the main text.


The technical portion of the paper is divided into
three sections. In Section 2, we present the constant
delay embedding. Section 3 contains the $O(\log N)$-time
adaptive routing algorithm, and in Section 4 we show how
to improve Rabin's fault-tolerant results with a simpler
algorithm.

2 Constant Delay Reconfiguration

To achieve a constant delay embedding, we need the
load, dilation and congestion all to be constant. The
embedding we will find will have a load and congestion
which depend strongly on the probability of failure;
clearly, the more nodes that fail, the more nodes that
have to be simulated by any one processor. However,
the dilation will always remain five, and each processor
will be simulated by one of its neighbors, provided that
$p < 1 - \sqrt[4]{1/2}$ (about .16).

In  order  to  simplify  the  analysis,  each  node  (live  or 
dead)  finds  a  neighbor  to  simulate  it.  We  first  assign 
nodes  to  live  neighbors  so  that  no  node  simulates  more 
than  a  constant  number  of  its  neighbors.  Then  each  pair 
of  nodes  simulating  neighbors  finds  a  live  path  between 
them  of  length  five  so  that  no  more  than  a  constant 
number  of  these  paths  congest  any  edge.  We  will  use 
two  similar  algorithms  to  accomplish  these  two  tasks. 

Let $A_p$ and $s_p$ be constants (to be determined later)
which depend only upon the probability $p$ of failure.
Call a node unsaturated if it is live and if it has been assigned
to simulate fewer than $A_p$ of its neighbors. Otherwise,
it is saturated.

Throughout, we will assume some fixed ordered labeling
of the nodes. The most convenient one is the lexicographical
order of the labels. The assignment algorithm
proceeds in rounds. During a round, a previously unsaturated
node might be picked by enough unassigned
nodes so as to exceed its capacity $A_p$. In such a case,
we require the node to accept enough of the simulation
requests to saturate it. All accepted nodes should have
lower labels than those which are rejected.

Algorithm  2.1  performs  the  first  phase: 

for i = 1 to s_p n
    for each unassigned node w
        w picks one of its neighbors uniformly
    each unsaturated node v agrees to simulate as many
        nodes as it can without exceeding its capacity,
        giving preference to lower labeled nodes
    all excess nodes remain unassigned


Figure  1:  Algorithm  2.1 
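Algorithm 2.1 can be simulated in a few lines. The following is a minimal Python sketch, assuming node labels are integers in $0..2^n-1$, faults are given as a set of dead labels, `A` stands for $A_p$ and `rounds` for $s_p n$; the function name and the example parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

def algorithm_2_1(n, faulty, A, rounds, seed=0):
    """Parallel rounds of Algorithm 2.1 on H_n. Every node, live or dead,
    is assigned to a live neighbor that will simulate it.
    Returns (assignment: node -> simulating neighbor, unassigned nodes)."""
    rng = random.Random(seed)
    N = 1 << n
    load = [0] * N                    # neighbors each node has agreed to simulate
    assigned = {}
    for _ in range(rounds):
        requests = defaultdict(list)  # candidate simulator -> nodes asking it
        for w in range(N):
            if w not in assigned:
                requests[w ^ (1 << rng.randrange(n))].append(w)  # uniform neighbor
        for v, askers in requests.items():
            if v in faulty:
                continue              # dead nodes never simulate anyone
            # accept lowest labels first, up to the remaining capacity A - load[v]
            for w in sorted(askers)[:A - load[v]]:
                assigned[w] = v
                load[v] += 1
        if len(assigned) == N:
            break
    return assigned, set(range(N)) - set(assigned)

n, p, A = 10, 0.1, 4
fault_rng = random.Random(1)
faulty = {v for v in range(1 << n) if fault_rng.random() < p}
assignment, unassigned = algorithm_2_1(n, faulty, A, rounds=3 * n)
print(len(unassigned), max(Counter(assignment.values()).values()))
```

The load bound $A_p$ holds deterministically, because a saturated node's slice of accepted requests is empty; whether every node gets assigned is the probabilistic part that the lemmas below address.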

Since the algorithm never assigns a saturated node to
simulate another node, no node simulates more than $A_p$
nodes. Thus, a constant load embedding results.

To facilitate our proofs, we will first formulate a sequential
algorithm similar to Algorithm 2.1. We will
prove  that  this  new  algorithm  assigns  to  each  node  a 
neighboring  node  to  simulate  it.  We  will  then  show 
that,  except  for  a  small  proportion  of  executions,  the 
algorithms  behave  the  same. 

In each round of Algorithm 2.2, unassigned nodes act
sequentially. Each node chooses a neighbor to simulate
it only after all lower labeled nodes have chosen. We
would like to ensure that all nodes have a large number
of choices that will result in a successful assignment. Let
$\alpha_p$ depend only upon the probability $p$. If some node
$w$ has fewer than $\alpha_p n$ unsaturated neighbors to choose
from during its turn, we designate an arbitrary set of
saturated neighbors as dedicated to $w$ during its turn. If
$w$ chooses a dedicated node during that particular turn,
the dedicated node agrees to simulate $w$ even though it
is saturated. We dedicate enough nodes so that $w$ has
at least $\alpha_p n$ neighbors which, if chosen, will agree to
simulate it.

for i = 1 to s_p n
    for unassigned nodes w in lexicographic order
        if w has fewer than α_p n unsaturated neighbors
            arbitrarily dedicate enough (saturated) neighbors
        w picks one of its neighbors uniformly
        if the chosen node is unsaturated or dedicated
            w is assigned to that node
        else w remains unassigned


Figure  2:  Algorithm  2.2 
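A sequential sketch of Algorithm 2.2 in the same hypothetical setting follows. It additionally counts the turns on which dedication was needed, which the following lemmas show is, with high probability, zero. Here `alpha_n` stands in for $\alpha_p n$. Restricting dedication to live neighbors is our own assumption, since a dead node cannot simulate anything.

```python
import random

def algorithm_2_2(n, faulty, A, alpha_n, rounds, seed=0):
    """Sequential variant: within a round, unassigned nodes choose one at a
    time in label order. Returns (assignment, number of dedication turns)."""
    rng = random.Random(seed)
    N = 1 << n
    load = [0] * N
    assigned = {}
    dedications = 0
    for _ in range(rounds):
        for w in range(N):                       # lexicographic order
            if w in assigned:
                continue
            nbrs = [w ^ (1 << i) for i in range(n)]
            willing = {v for v in nbrs if v not in faulty and load[v] < A}
            if len(willing) < alpha_n:
                # Dedicate arbitrary live but saturated neighbors for this turn only.
                spare = [v for v in nbrs if v not in faulty and v not in willing]
                willing.update(spare[:alpha_n - len(willing)])
                dedications += 1
            v = rng.choice(nbrs)                 # uniform over all n neighbors
            if v in willing:
                assigned[w] = v
                load[v] += 1                     # can exceed A only via dedication
        if len(assigned) == N:
            break
    return assigned, dedications

n, A = 10, 4
fault_rng = random.Random(1)
faulty = {v for v in range(1 << n) if fault_rng.random() < 0.1}
assignment, dedications = algorithm_2_2(n, faulty, A, alpha_n=3, rounds=6 * n)
```

If `dedications` is zero, every acceptance went to an unsaturated node in label order, which is exactly the situation in which the sequential and parallel versions agree.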

We  will  show  below  that  with  high  probability  no 
nodes  are  ever  dedicated  during  Algorithm  2.2.  In  that 
case,  the  result  is  the  same  whether  unassigned  nodes 
choose  sequentially  or  in  parallel,  provided  preference 
goes to the lower labeled nodes. Thus we will show that
Algorithms 2.1 and 2.2 produce the same output with high probability.

The following lemma proves that Algorithm 2.2 terminates
quickly.

Lemma 2.1. With high probability all nodes have been
assigned after $s_p n$ steps of Algorithm 2.2, for sufficiently
large $s_p$.

Proof. Because each node always has at least $\alpha_p n$
neighbors which will simulate it if chosen, the probability
that a given node is assigned during some step is
at least $\alpha_p$, independent of what has occurred in previous
steps. Thus the probability that a node remains
unassigned after $s_p n$ steps is no more than
$(1-\alpha_p)^{s_p n} \le e^{-\alpha_p s_p n} \le N^{-\alpha_p s_p}$.
This quantity is less than $N^{-c}$ as long as $s_p > c/\alpha_p$.
Summing over the $N$ nodes, all nodes are assigned with
probability at least $1 - N^{1-c}$. ■
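The choice of $s_p$ can be checked numerically. Since $(1-\alpha)^{sn} = N^{s\log_2(1-\alpha)}$ when $n = \log_2 N$, the smallest admissible factor is $c/\log_2\frac{1}{1-\alpha}$, which is never larger than the simpler sufficient value $c/\alpha$ used in the proof. The helper name and the sample values $\alpha = 1/4$, $c = 2$ below are illustrative assumptions.

```python
import math

def min_round_factor(alpha, c):
    """Smallest s with (1 - alpha)**(s*n) <= N**(-c) for every n = log2 N.
    Since (1 - alpha)**(s*n) = N**(s*log2(1 - alpha)), we need
    s * log2(1 / (1 - alpha)) >= c."""
    return c / math.log2(1 / (1 - alpha))

alpha, c = 0.25, 2
s_exact = min_round_factor(alpha, c)   # about 4.8
s_proof = c / alpha                    # 8.0, the bound from the proof
```

For these values the exact threshold is a little under 5, comfortably below the value 8 that the cruder estimate $1-\alpha \le e^{-\alpha}$ gives.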


The following two lemmas show that with high probability
Algorithm 2.2 never dedicates saturated nodes.
Thus with high probability Algorithms 2.1 and 2.2 behave
identically. This proves that Algorithm 2.1 assigns
all nodes with high probability. Similar reasoning proves
the Dance Hall Theorem described in the introduction.


Lemma 2.2. For $p < .16$, there exists a constant $c_p$ such that
with high probability each node has at least $c_p n$ live
neighbors.


Proof. The probability that a node has fewer than $\epsilon n$
live neighbors equals

$$\sum_{i=0}^{\epsilon n - 1} \binom{n}{i} (1-p)^i \, p^{\,n-i}.$$

For $\epsilon$ small enough, the ratio of consecutive terms,
$\frac{n-i}{i+1}\cdot\frac{1-p}{p}$, is always greater than 2, so this
sum is bounded by a constant times its last term. That term is at most

$$\binom{n}{\epsilon n} (1-p)^{\epsilon n} \cdot p^{(1-\epsilon) n}.$$

The second factor in the product can be made less than
$N^{-1-\delta}$ for some $\delta > 0$ by taking $\epsilon$ small enough. The first
factor in the product can be made less than $N^{\delta/2}$ by taking
$\epsilon$ small enough as well. The probability that some node
has too few neighbors is bounded by the sum of the
probabilities for the individual nodes. This multiplies
the above bound by $N$. Thus for any $\epsilon$ below both
these thresholds, the lemma holds with $c_p = \epsilon$. ■
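The binomial tail in this proof can be evaluated exactly for concrete values. The sketch below uses the illustrative choice $n = 40$ (so $N = 2^{40}$), $p = 0.1$ and threshold $k = \epsilon n = 8$. It checks both that the sum is within a factor 2 of its last term and that the union bound over all $N$ nodes is still tiny.

```python
from math import comb

def few_live_prob(n, p, k):
    """Exact probability that fewer than k of n neighbors are live,
    each neighbor failing independently with probability p."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i) for i in range(k))

def last_term(n, p, k):
    """The i = k - 1 term, which dominates the sum when eps = k/n is small."""
    return comb(n, k - 1) * (1 - p) ** (k - 1) * p ** (n - k + 1)

n, p, k = 40, 0.1, 8          # N = 2**40 nodes, eps = k/n = 0.2
tail = few_live_prob(n, p, k)
union_bound = 2 ** n * tail   # probability that some node has too few live neighbors
```

With these numbers the ratio of consecutive terms exceeds 40 near the end of the sum, so the geometric-series bound in the proof is very loose.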


Lemma 2.3. Given a failure rate $p$, with high probability
a given node $v$ never has fewer than $\alpha_p n$ unsaturated
neighbors available during Algorithm 2.2, for $\alpha_p = c_p/2$.

Proof. By Lemma 2.2, $v$ has at least $c_p n$ live neighbors. For $v$
to have fewer than $\alpha_p n$ unsaturated neighbors at some point
during Algorithm 2.2, at least $(c_p - \alpha_p)n = \alpha_p n$ of $v$'s
neighbors must have become saturated during the course of the algorithm.

Each node always has at least $\alpha_p n$ neighbors (including
dedicated nodes) to which it might be assigned during
any step. Further, if it is assigned, it is equally likely
to be assigned to any one of those neighbors. Thus no
node has a probability greater than $\frac{1}{\alpha_p n}$ that it will be
assigned to any given neighbor, no matter what other
assignments have been made previously.

There are no more than $n^2$ nodes which might be assigned
to some node in $v$'s neighborhood. The probability
that at least $\alpha_p n$ of $v$'s neighbors become saturated
is thus no more than

$$\binom{n^2}{A_p \alpha_p n} \left( \frac{2}{\alpha_p n} \right)^{A_p \alpha_p n}.$$

To saturate $\alpha_p n$ of $v$'s neighbors, there must be at least
$A_p \alpha_p n$ nodes at Hamming distance two from $v$, each of
which is assigned to a neighbor of $v$. The first factor
in the product represents the number of ways to choose
these nodes. Each one of these nodes has at most two
neighbors of $v$ to which it might be assigned. Thus the
second factor upper-bounds the probability that each
of the $A_p \alpha_p n$ nodes is actually assigned to a neighbor
of $v$. Although the probabilities of such selections are
technically dependent, the probability that a given node is
assigned to a neighbor of $v$ is at most $\frac{2}{\alpha_p n}$, no matter
what choices the other nodes made.

Since $\binom{a}{m} \le (ea/m)^m$, the bound is at most
$\left(\frac{2e}{A_p \alpha_p^2}\right)^{A_p \alpha_p n}$, so for $A_p$ large enough
this quantity is an inverse polynomial in $N$. ■
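To see how large $A_p$ must be, the bound can be evaluated in logarithmic form. In this sketch, `alpha_n` stands for the integer $\alpha_p n$, and the values ($n = 64$, $\alpha_p = 1/2$) are illustrative assumptions; dividing by $\ln N = n \ln 2$ expresses the bound as a power of $N$.

```python
import math

def lemma_2_3_log_bound(n, alpha_n, A):
    """Natural log of C(n^2, m) * (2 / alpha_n)**m with m = A * alpha_n,
    the bound on the probability that alpha_n of v's neighbors saturate."""
    m = A * alpha_n
    return math.log(math.comb(n * n, m)) + m * math.log(2 / alpha_n)

n, alpha_n = 64, 32                                      # alpha_p = 1/2
exp_large = lemma_2_3_log_bound(n, alpha_n, 40) / (n * math.log(2))  # power of N
exp_small = lemma_2_3_log_bound(n, alpha_n, 4) / (n * math.log(2))
```

With $A_p = 40$ the bound is roughly $N^{-22}$, while with $A_p = 4$ it exceeds 1 and says nothing, matching the requirement $A_p > 2e/\alpha_p^2$ suggested by the simplified form.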

Lemma 2.3 implies that with high probability Algorithms
2.1 and 2.2 behave identically. We know that
Algorithm 2.2 successfully assigns each node to a neighbor
with high probability and that Algorithm 2.1 never
assigns more than $A_p$ nodes to any node. We conclude
that Algorithm 2.1 achieves a constant load embedding
with high probability.

Once we have assigned simulating nodes, we need to find
paths to simulate the edges in the hypercube. Say that
$v^b$ simulates $v$ and $v^{kb'}$ simulates $v^k$. Then to simulate
the edge $(v, v^k)$, the nodes $v^b$ and $v^{kb'}$ choose
a path between them of the form
$P(v, v^k, b, b', r) = (v^b, v^{br}, v^r, v^{rk}, v^{rkb'}, v^{kb'})$.
To avoid ambiguity, we will refer to the choice of $r$ as if it
were made by $v$ and $v^k$, even though $v^b$ and $v^{kb'}$ actually choose.
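In the integer-label representation, the path $P(v, v^k, b, b', r)$ is a sequence of single-bit flips. The sketch below lists its six nodes and confirms that consecutive nodes are hypercube neighbors, so the path has length five. Dimensions are 0-based bit positions here, an assumed convention, and the names are hypothetical.

```python
def flip(x, *dims):
    """Flip the given (0-based) dimensions of label x in order."""
    for d in dims:
        x ^= 1 << d
    return x

def path(v, k, b, bp, r):
    """P(v, v^k, b, b', r) = (v^b, v^{br}, v^r, v^{rk}, v^{rkb'}, v^{kb'})."""
    return [flip(v, b), flip(v, b, r), flip(v, r), flip(v, r, k),
            flip(v, r, k, bp), flip(v, k, bp)]

v = 0b10110010
P = path(v, k=0, b=1, bp=2, r=5)
```

The path starts at the simulator $v^b$ of $v$ and ends at the simulator $v^{kb'}$ of $v^k$, and the dimensions crossed are $r, b, k, b', r$ in that order.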


Figure  3:  A  Choice  of  Live  Path 

For two adjacent nodes $v$ and $v^k$, let $S(v, v^k, b, b')$ be
the set of dimensions $r \ne k$ for which $P(v, v^k, b, b', r)$
is a live path. Because $p < 1 - \sqrt[4]{1/2}$, there is a chance
$s = (1-p)^4 > 1/2$ that any given path $P(v, v^k, b, b', r)$
is live (its endpoints are already known to be live, so only its
four interior nodes matter). Note that the paths $P(v, v^k, b, b', r)$
($r \ne k$) are internally node-disjoint for a fixed choice of $v$, $v^k$,
$b$ and $b'$. Thus the probability that any one of them is live is
independent of the other paths.
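Both facts used here can be checked mechanically: the interior nodes of the candidate paths for different $r$ never coincide, and $(1-p)^4 > 1/2$ exactly when $p < 1 - 2^{-1/4} \approx 0.159$. In this sketch, letting $r$ range over dimensions outside $\{k, b, b'\}$ is our assumption, since the degenerate choices $r = b$ or $r = b'$ make the path revisit a node. The names are hypothetical.

```python
from itertools import combinations

def flip(x, *dims):
    """Flip the given (0-based) dimensions of label x in order."""
    for d in dims:
        x ^= 1 << d
    return x

def interior(v, k, b, bp, r):
    """Interior nodes of P(v, v^k, b, b', r): v^{br}, v^r, v^{rk}, v^{rkb'}."""
    return {flip(v, b, r), flip(v, r), flip(v, r, k), flip(v, r, k, bp)}

def paths_disjoint(n, v, k, b, bp):
    """True iff candidate paths for distinct r share no interior node."""
    sets = [interior(v, k, b, bp, r) for r in range(n) if r not in (k, b, bp)]
    return all(s.isdisjoint(t) for s, t in combinations(sets, 2))

P_MAX = 1 - 2 ** (-1 / 4)   # largest p with (1 - p)**4 >= 1/2, about 0.159
```

Every interior node of the path for $r$ differs from $v$ in dimension $r$, while no interior node of the path for $r' \ne r$ does, which is why the check always succeeds.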

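Lemma 2.4. With high probability, for all quadruples
$(v, v^k, b, b')$, $|S(v, v^k, b, b')| > \eta_p n$ for some constant $\eta_p$.

Proof. Same as Lemma 2.2, with each path failing with
probability $1 - s < 1/2$ in place of a node failing with probability $p$,
except that there are $N \log^3 N$ different quadruples. ■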

With high probability, we know that all pairs of neighbors
have many paths from which to choose. What remains
is for them to decide in a systematic but local
fashion how to choose from among these paths without
congesting any edge too much. In the rest of this section,
we explore a way to choose paths in this manner.

Take a node $v$ simulated by its neighbor $v^b$ and consider
the set $E_{v,b}$ of edges $\{(v^{br}, v^r)\}$. There are $2n^2$
nodes $w$ (all of the form $w = v^{rt}$ or $w = v^{brt}$) which
(like $v$) might potentially use one of the edges in the set
as a second edge along a path. Any node which actually
does must be simulated by its neighbor across dimension
$b$. The next lemma bounds the number of such nodes.

Lemma 2.5. For sufficiently large $\delta_p$ and with high
probability, of the $2n^2$ nodes at distance 0 or 2 from
either $v$ or $v^b$, no more than $\delta_p n$ of them are simulated
by neighbors across dimension $b$.

Proof. As noted before, each node has a probability
no more than $\frac{1}{\alpha_p n}$ of borrowing across any given dimension,
regardless of the choices made by other nodes. The
probability that many nodes choose across the same dimension
is no more than

$$\binom{2n^2}{\delta_p n} \left( \frac{1}{\alpha_p n} \right)^{\delta_p n}.$$

Of course, the actual probabilities depend on the particular
$\delta_p n$-size subset we consider and on the relative
order in which the nodes of the subset successfully found
neighbors to simulate them. Any node's probabilities
are then conditioned upon other nodes' previous choices.
No matter how these choices are made, however, the
stated probabilities are upper bounds on the actual
probabilities, since when each node chooses it always has
at least $\alpha_p n$ choices.

This bound is at most $\left(\frac{2e}{\alpha_p \delta_p}\right)^{\delta_p n}$, so for sufficiently
large $\delta_p$ it is smaller than an inverse polynomial in $N$. ■

Each of the at most $\delta_p n$ nodes (except for $v$ and $v^b$)
can use at most two edges in the set $E_{v,b}$ as a second
edge along some path. To use an edge as a second edge,
such a node would have to be a neighbor of one of the
nodes incident to the edge. If $w$ is of the form $w = v^{rt}$,
then $w$ is adjacent to $v^r$ and $v^t$ and to no other node
incident to an edge in $E_{v,b}$. Similar reasoning applies to
nodes $w$ which satisfy $w = v^{brt}$. Finally, each of $v$ and
$v^b$ can use no more than $n$ edges of $E_{v,b}$ as a second edge
along some path. If we sum over all edges in $E_{v,b}$ the
number of nodes which can use each edge as a second
edge, counting according to multiplicity, the total will
be no more than $(2\delta_p + 2)n$. Therefore no more than
$\frac{\eta_p}{4} n$ of these edges will have more than
$\gamma_p = \frac{4(2\delta_p + 2)}{\eta_p}$ of those $\delta_p n$ nodes potentially using them
as second edges. Let $S'(v, b) = \{r \mid$ more than $\gamma_p$ nodes can send a path
through the edge $(v^{br}, v^r)\}$. Then $|S'(v, b)| \le \frac{\eta_p}{4} n$.

Let $T(v, v^k, b, b') = S(v, v^k, b, b') - S'(v, b) - S'(v^k, b')$.
Then for each adjacent pair of nodes $v$ and $v^k$,
$|T(v, v^k, b, b')| \ge \frac{\eta_p}{2} n$. The sets $T(v, v^k, b, b')$ will be
crucial to our reasoning. The probability that a pair successfully
chooses a path between them is lower bounded
by the probability that they successfully choose a path
from $T(v, v^k, b, b')$.


Note that among the edges in all the paths represented
by the sets $T(v, v^k, b, b')$, there are now only a
logarithmic number of quadruples $(w, w^j, c, c')$ which
might potentially congest any given edge. We have already
limited the number of paths for which the edge is
the second edge along the path. If the edge is the first
edge along the path, then one of the edge's endpoints
is the simulating node. Each endpoint simulates only
a constant number of nodes, and each simulated node
contributes exactly $n$ paths. If the edge is the third edge
along the path, then the path is simulating an edge at
Hamming distance one from the edge considered. There
are exactly $n$ edges of this type. The cases in which the
edge is the fourth or fifth edge along the path are identical
to the first two cases. Thus each edge can be potentially
congested by no more than $\rho_p n = (4A_p + 2\gamma_p + 1)n$
paths.

We can now describe Algorithm 2.3, which assigns paths to simulate edges. During Algorithm 2.3, each edge will decide whether or not to accept some path routed through it. Because the other edges in the path simultaneously decide whether or not to accept the path, it is possible that some might accept it while others reject it. If this happens, we assume that an accepting edge counts the path as contributing to its load anyway. Call an edge saturated if it has accepted exactly β_p paths routed through it. Otherwise, call it unsaturated. Order the pairs (v, v^k) lexicographically. As before, in any round an edge accepts the lowest ordered pairs which try to route through it until it reaches its capacity.

for i = 1 to s_p n
    for each unassigned adjacent pair of nodes (v, v^k)
        (v, v^k) picks a path between them uniformly
    each unsaturated edge agrees to as many paths routed through it as it can
        without exceeding its capacity, giving preference to lower labeled pairs
    all excess pairs remain unassigned


Figure  4:  Algorithm  2.3 
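The acceptance rule used in each round of Algorithm 2.3 can be sketched in Python. This is a minimal illustration under assumptions not fixed by the text: pair labels are any sortable Python values (tuple ordering stands in for the lexicographic order on pairs), a chosen path is a list of hashable edge labels with no repeated edge, and `acceptance_round` is a hypothetical helper name. As the text specifies, an edge that accepts a path keeps it in its load even if another edge on the same path rejects it.

```python
from collections import defaultdict


def acceptance_round(requests, load, capacity):
    """One round of edge acceptance in the style of Algorithm 2.3.

    requests: dict mapping a pair label (any sortable value, e.g. a tuple)
              to the list of edges on the path that pair picked this round.
    load:     dict mapping edge -> number of paths it has already accepted.
              Updated in place: an edge keeps counting a path it accepted
              even if another edge on that path rejected it.
    capacity: the saturation threshold (beta_p in the text).
    Returns the set of pairs whose path was accepted by every edge on it.
    """
    # Collect, for every edge, the pairs whose chosen path uses it.
    by_edge = defaultdict(list)
    for pair, path in requests.items():
        for edge in dict.fromkeys(path):  # dedupe while keeping path order
            by_edge[edge].append(pair)

    accepted_by = defaultdict(set)  # pair -> edges that accepted its path
    for edge, pairs in by_edge.items():
        for pair in sorted(pairs):          # lowest-ordered pairs first
            if load.get(edge, 0) >= capacity:
                break                        # edge is saturated
            load[edge] = load.get(edge, 0) + 1
            accepted_by[pair].add(edge)

    return {pair for pair, path in requests.items()
            if accepted_by[pair] == set(path)}


load = {}
requests = {(0, 1): ["e1", "e2"], (0, 2): ["e2", "e3"], (1, 3): ["e2"]}
print(sorted(acceptance_round(requests, load, capacity=2)))  # [(0, 1), (0, 2)]
print(load)  # {'e1': 1, 'e2': 2, 'e3': 1}
```

In the usage lines, edge `e2` has room for only two of its three requests, so the highest-ordered pair (1, 3) stays unassigned for this round.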

Paralleling what we did before, we will present Algorithm 2.4, a sequential version of Algorithm 2.3. We will show that this modified algorithm terminates having assigned paths between every pair of nodes simulating neighbors, with high probability. Maintaining the parallel with what we proved earlier in this section, we will then show that the two algorithms perform indistinguishably, with high probability. At any time when the pair (v, v^k) attempts to choose a path between them during Algorithm 2.4, let U(v, v^k, b, b') be the subset of T(v, v^k, b, b') consisting of dimensions r for which all of the edges along P(v, v^k, b, b', r) are unsaturated. Define the dedication of a path containing a saturated edge in a fashion similar to the dedication of saturated neighbors before. We dedicate paths to the pair (v, v^k) whenever there are not θ_p n choices for a simulating path.

for i = 1 to s_p n
    for all unassigned pairs (v, v^k) in order
        if |U(v, v^k, b, b')| < θ_p n
            dedicate enough r ∈ T(v, v^k, b, b')
        (v, v^k) picks a path between them uniformly
        if the chosen path is unsaturated or dedicated
            (v, v^k) is assigned to the path
        else (v, v^k) remains unassigned


Figure  5:  Algorithm  2.4 

Lemma 2.6. For a suitably large choice of the constant s_p, with high probability all pairs of nodes searching for an assignment to a path have been assigned one after s_p n steps of Algorithm 2.4.

Proof. Each pair is successfully assigned with probability at least θ_p during any step. The rest of the proof is identical to that of Lemma 2.1. ■

We now show that with high probability Algorithm 2.4 never adds dedicated paths with saturated edges to any U(v, v^k, b, b'). Thus with high probability Algorithms 2.3 and 2.4 behave identically. This proves that Algorithm 2.3 assigns all necessary paths with high probability.

We will need the following general bound on sums of random variables from [S].

Lemma 2.7. Let {Y_i} be independent random variables with Pr[Y_i = 1] = φ_i and Pr[Y_i = 0] = 1 - φ_i. Set Y = Σ_i Y_i and φ = Σ_i φ_i. Then

$$\Pr[Y > \phi + a] < e^{-a^2/(2\phi + a)}.$$
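To make a bound of this type concrete, the sketch below compares it with an exact tail probability. It assumes the bound reads Pr[Y > φ + a] < exp(-a^2/(2φ + a)), a slightly weakened form of Bernstein's inequality that holds for any sum of independent 0/1 variables; the exact tail is computed by dynamic programming over the number of successes. The test distribution and function names are illustrative only.

```python
import math


def exact_upper_tail(probs, threshold):
    """Exact Pr[Y > threshold] for Y = sum of independent Bernoulli(probs[i]),
    by dynamic programming over the number of successes seen so far."""
    dist = [1.0]  # dist[k] = Pr[k successes among the variables processed]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)      # this variable is 0
            nxt[k + 1] += mass * p        # this variable is 1
        dist = nxt
    return sum(mass for k, mass in enumerate(dist) if k > threshold)


def tail_bound(phi, a):
    """The assumed form of the bound: exp(-a^2 / (2*phi + a))."""
    return math.exp(-a * a / (2 * phi + a))


probs = [0.1] * 50 + [0.3] * 50   # phi = 5 + 15 = 20
phi = sum(probs)
for a in (5, 10, 20):
    print(a, exact_upper_tail(probs, phi + a), tail_bound(phi, a))
```

The exact tails fall off much faster than the bound, which is expected: the lemma trades sharpness for a form that is easy to apply with only the mean φ known.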

Lemma 2.8. With high probability no set U(v, v^k, b, b') ever has cardinality less than θ_p n at the beginning of some step of Algorithm 2.4, given θ_p = τ_p/2.

Proof. There are at most μ_p n pairs which have a non-zero probability of congesting a given edge on some path represented by an r ∈ T(v, v^k, b, b'). Thus at most 5μ_p n^2 pairs have non-zero probability of congesting any of those edges, counting according to multiplicity. For a path to leave U(v, v^k, b, b'), one of its edges must become saturated. For (τ_p - θ_p)n = θ_p n paths to become unavailable, β_p θ_p n pairs must choose a path crossing an edge on some path represented by an r ∈ T(v, v^k, b, b').

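The probability that a pair chooses any particular path is at most 1/(θ_p n), no matter what other choices are made. Thus if there are q_{w,w'} paths that a particular pair (w, w') might choose which contain an edge on some path in T(v, v^k, b, b'), then the probability that (w, w') chooses such a path is at most $q_{w,w'}/(\theta_p n)$. Since each of the at most 5n edges on the paths in T(v, v^k, b, b') lies on at most μ_p n candidate paths,

$$\sum_{(w,w')} \frac{q_{w,w'}}{\theta_p n} \le \frac{5\mu_p n^2}{\theta_p n} = \frac{5\mu_p n}{\theta_p}.$$

Then, by Lemma 2.7 applied with this expectation, the probability that β_p θ_p n pairs choose such paths, and hence that more than θ_p n paths become unavailable, is smaller than an inverse polynomial in N for large enough β_p. ■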

With high probability O(n) steps are sufficient to select all paths. Since we have guaranteed that the paths have constant congestion, this proves the following theorem.


Theorem 2.9. For each $p < 1 - 2^{-1/4}$ (about .16) there is an O(log N) step algorithm such that if each of the nodes of an N-node hypercube fails with probability p, then with high probability the algorithm finds an embedded fully functioning N-node cube with constant load, dilation and congestion. The paths which simulate the edges of the cube only use live nodes.


Edge faults are easily handled once node faults are understood. Say each edge fails with probability p_e, each node fails with probability p_n, and the failure of any component is independent of the failure of other components. Then the results of this section follow with little change. Specifically, as long as $p_n + p_e - p_n p_e < 1 - 2^{-1/5}$ (about .13), the algorithms of this section work with high probability. The only addition to our reasoning is that when one node tries to communicate with a neighbor node, it is unsuccessful not only if the neighbor is faulty but also if the link between them has failed.

Another extension of this approach is to the case of arbitrarily large constant probabilities of failure. By showing that most local areas of the cube retain a good structure even when each node fails with probability very close to 1, we can prove that constant delay simulations are always possible.


Theorem 2.10. Say each node of an N-node hypercube fails independently and with constant probability p < 1. Then with high probability, the faulty hypercube can simulate a completely functioning N-node hypercube with only constant slowdown.


In the preceding discussion, we ignored several details of implementation. For example, we assumed that faulty nodes could participate in the algorithm to search for nodes to simulate them. In reality, other nodes must participate for the faulty nodes. Also, much information must be exchanged by the various participants in the algorithm. It can be shown how to implement this algorithm using polylogarithmic time per step.


3  Fast  Routing  Around  Faults 

In this section we examine the problem of routing a permutation on a faulty hypercube. We describe a variant of Valiant-Brebner routing on the hypercube that we call offset routing. In order to present our ideas more easily, we first review some basic ideas from Valiant and Brebner's results.

The butterfly is obtained from the hypercube by replacing each node v of the cube by a cycle (v_0, v_1, ..., v_n, v_0). We replace each edge (v, v^i) by a pair of edges (v_{i-1}, v^i_i) and (v^i_{i-1}, v_i). We can visualize the set of nodes {v_i | v ∈ H_n} as sharing a level of the butterfly. We call edges of the form (v_{i-1}, v_i) straight edges and those of the form (v_{i-1}, v^i_i) cross edges. All edges connect nodes in adjacent levels or level n to level 0.
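The butterfly construction above can be enumerated directly for small n. In this sketch (an illustration, not the paper's code) hypercube nodes are 0/1 tuples whose coordinate i is dimension i, and `flip` and `butterfly_edges` are hypothetical helper names. The wraparound edge (v_n, v_0) that closes each cycle is counted among the straight edges.

```python
from itertools import product


def flip(v, i):
    """v^i: the 0/1 tuple v with coordinate i (1-indexed) complemented."""
    return v[:i - 1] + (1 - v[i - 1],) + v[i:]


def butterfly_edges(n):
    """Straight and cross edges of the butterfly derived from the n-cube.

    A butterfly node is a pair (v, level) with v a 0/1 tuple of length n.
    """
    straight, cross = [], []
    for v in product((0, 1), repeat=n):
        for i in range(1, n + 1):
            straight.append(((v, i - 1), (v, i)))        # (v_{i-1}, v_i)
            cross.append(((v, i - 1), (flip(v, i), i)))  # (v_{i-1}, v^i_i)
        straight.append(((v, n), (v, 0)))  # closes the cycle v_0, ..., v_n, v_0
    return straight, cross


straight, cross = butterfly_edges(3)
print(len(straight), len(cross))  # 32 24
```

For n = 3 there are 8 columns, each contributing 4 straight edges and 3 cross edges; the 24 cross edges come in pairs, two for each of the 12 hypercube edges.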

We view our hypercube routing as if it took place on the butterfly. A packet starts at some node v_0 and ends at some node w_n. We think of the column of nodes {v_i} as shared by the hypercube node v, which assigns each node in the column a different queue from a set of n queues. If a message traverses the straight edge (v_{i-1}, v_i) in some butterfly step, then it is passed from node v's (i-1)th queue to its ith queue in the hypercube step. If the message traverses the cross edge (v_{i-1}, v^i_i) in some butterfly step, then it is passed from v's (i-1)th queue to v^i's ith queue in the hypercube step. In the remainder of the paper, we will view the routing algorithms as if they take place on the butterfly, although the proofs we give will bound performance on the hypercube as well.

Routing from v_0 to w_n is simplified by the fact that there is a unique path of length n between those two nodes. The ith step in the path connects a node at level i - 1 with one at level i. If v and w agree in the ith bit, the edge is a straight edge. If they differ, a cross edge is used. For example, to route from the node (1,1,0)_0 to the node (0,1,1)_3 we would use the path

(1,1,0)_0, (0,1,0)_1, (0,1,0)_2, (0,1,1)_3.
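The rule for the unique path can be written as a short loop that fixes one coordinate per level. This Python sketch uses the same tuple notation as the example, with coordinate i corresponding to dimension i; `unique_path` is a hypothetical name.

```python
def unique_path(v, w):
    """The unique length-n butterfly path from (v, 0) to (w, n).

    Step i goes from level i-1 to level i: a straight edge if v and w agree
    in coordinate i, a cross edge (flipping coordinate i) otherwise.
    """
    path, cur = [(v, 0)], v
    for i in range(1, len(v) + 1):
        if cur[i - 1] != w[i - 1]:
            cur = cur[:i - 1] + (w[i - 1],) + cur[i:]  # cross edge
        path.append((cur, i))
    return path


print(unique_path((1, 1, 0), (0, 1, 1)))
# [((1, 1, 0), 0), ((0, 1, 0), 1), ((0, 1, 0), 2), ((0, 1, 1), 3)]
```

The output reproduces the worked example: coordinates 1 and 3 differ, so steps 1 and 3 use cross edges, while step 2 uses a straight edge.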

In the first phase of the Valiant-Brebner routing algorithm, each node in level 0 first sends its packet to a random node in the nth level using the unique path of length n. From there the packet is passed across the straight edge to the level 0 node and then, in the second phase, it is routed along the unique path to its true destination. In [VB] it was shown that this algorithm takes O(n) steps to complete and uses total queue length O(n) at every hypercube node, with high probability.
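A small experiment in the spirit of the two-phase scheme counts how many packets use each butterfly edge when a random permutation is routed. This is an illustrative sketch only: nodes are stored as integer bitmasks with dimension i taken to be bit i - 1, the random intermediate node is drawn independently for each packet, and no queueing or timing is simulated, so the counts describe path congestion rather than the running time analyzed in [VB]. The function names are hypothetical.

```python
import random
from collections import Counter


def path_edges(src, dst, n):
    """Edges of the unique butterfly path from (src, 0) to (dst, n).
    Nodes are (int bitmask, level); dimension i is bit i-1 (assumed convention)."""
    edges, cur = [], src
    for i in range(1, n + 1):
        mask = 1 << (i - 1)
        nxt = (cur & ~mask) | (dst & mask)   # copy bit i-1 from the destination
        edges.append(((cur, i - 1), (nxt, i)))
        cur = nxt
    return edges


def valiant_brebner_loads(perm, n, rng):
    """Route the packet from v to perm[v] for every v: phase 1 to a random
    intermediate u at level n, the straight edge (u_n, u_0), then phase 2.
    Returns a Counter giving the number of packets using each butterfly edge."""
    load = Counter()
    for v in range(1 << n):
        u = rng.randrange(1 << n)
        route = path_edges(v, u, n) + [((u, n), (u, 0))] + path_edges(u, perm[v], n)
        load.update(route)
    return load


n = 8
rng = random.Random(1)
perm = list(range(1 << n))
rng.shuffle(perm)
loads = valiant_brebner_loads(perm, n, rng)
print(max(loads.values()))
```

Each packet crosses n edges in each phase plus one straight wraparound edge, so the loads always sum to 2^n(2n + 1); the largest entry of the counter gives a feel for how evenly the random intermediates spread the traffic.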

In the offset routing algorithm, each packet remains fairly close to where the Valiant-Brebner scheme would send it. Its location always differs from where their algorithm would send it by some offset which is a random dimension. Its ability to move easily among the offset nodes will be enhanced by the addition of "jump" edges between the nodes on a given butterfly level. These ideas will be made more concrete in what follows.


A jump edge is an edge of the type (v_j, v^i_j). Jump edges are not butterfly edges. A packet traversing such an edge would be sent (in the hypercube) from the jth queue of v across the edge (v, v^i) and deposited in the jth queue of v^i. Note that all n jump edges of the type (v_j, v^i_j) (j varying) are actually manifestations of a single hypercube edge from v to v^i. This means that every hypercube edge is represented n + 2 times in the butterfly with jump edges, as n different jump edges and 2 cross edges.

Call the path traversed by a packet in the Valiant-Brebner scheme its virtual path. In the offset routing algorithm, a packet whose virtual path would go through the node v_{k-1} will pass instead through some node v^i_{k-1}. If its virtual path would leave v_{k-1} via a straight edge, then the offset path will traverse three edges of the type (v^i_{k-1}, v^{ij}_{k-1}, v^{ij}_k, v^j_k). It finds such a path by randomly choosing a dimension j ≠ i and attempting to route across the appropriate three edges. If the packet encounters a fault in any of the edges or nodes along those three edges, it returns to the node v^i_{k-1}, which chooses another random dimension and tries again. Note that this means that a packet might have to traverse many more than three edges to pass from one level to the next. If the virtual path would leave v_{k-1} via a cross edge, then the offset path traverses three edges of the type (v^i_{k-1}, v^{ij}_{k-1}, v^{ijk}_k, v^{jk}_k) instead. Note that no matter whether straight edges or cross edges are used in the virtual path, the packet ends with a random offset j from its virtual location.
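One level of offset routing, including the retry on faults, can be sketched as follows. The sketch models node faults only (edge faults would be checked the same way), stores hypercube nodes as integer bitmasks with dimension d taken to be bit d - 1, and draws the new dimension j uniformly from the dimensions other than i, following the description above. `offset_hop` and its parameters are hypothetical.

```python
import random


def offset_hop(v, i, k, cross, faulty, n, rng, max_tries=10_000):
    """Move a packet from offset i at level k-1 to some offset j at level k.

    v:      virtual hypercube node (int bitmask); the packet sits at v^i_{k-1}.
    cross:  True if the virtual path uses the cross edge in dimension k.
    faulty: set of faulty hypercube nodes (only node faults are modelled here).
    Dimension d flips bit d-1 (an assumed convention).
    Returns (j, hypercube node reached at level k, number of attempts).
    """
    def bit(d):
        return 1 << (d - 1)

    for attempt in range(1, max_tries + 1):
        j = rng.choice([d for d in range(1, n + 1) if d != i])
        mid = v ^ bit(i) ^ bit(j)              # jump edge to v^{ij}_{k-1}
        top = mid ^ bit(k) if cross else mid   # butterfly edge to level k
        end = top ^ bit(i)                     # jump edge to v^j_k or v^{jk}_k
        if not ({mid, top, end} & faulty):
            return j, end, attempt
        # Otherwise the packet returns to v^i_{k-1} and tries a new dimension.
    raise RuntimeError("no fault-free offset path found")


rng = random.Random(0)
# From v = 0 with offset i = 1, jumps across dimensions 2 and 3 hit faulty nodes.
print(offset_hop(0, 1, 2, False, {0b0011, 0b0101}, 4, rng)[:2])  # (4, 8)
```

In the usage line the packet keeps retrying until it picks j = 4 and lands on hypercube node 8, which is offset 4 from its virtual location v = 0; the attempt counter is what the proof of Lemma 3.4 bounds.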

To begin, the message generated by node v repeatedly chooses a random dimension j and attempts to route across the edge (v_0, v^j_0) until it successfully finds an initial offset. Say that the message reaches the nth level for the second time with offset i (i.e. it reaches the node w^i_n). Then to conclude, the message finds an offset j for which the path (w^i_n, w^{ij}_n, w^j_n, w_n) is fault-free.

We will prove that the total length of every offset path is O(n) and that every packet traverses its offset path in O(n) steps, with high probability. First we show that few messages ever cross any given small set of edges.

Lemma 3.1. Take an arbitrary set of h edges on one level of the butterfly. Then with high probability the Valiant-Brebner routing scheme routes only O(h + n) messages through edges in the set.

Proof. Note that each message can congest at most one edge in the set. The following analysis applies to the first phase of the routing algorithm. The analysis for the second phase is almost identical.

Say the edges share level l of the butterfly. Then we can partition the butterfly's first l levels into 2^{n-l} nonintersecting butterflies B_1, B_2, ..., B_{2^{n-l}}, each built from a subcube with 2^l nodes. For a message to route through one of the h edges, it must start in the same butterfly as the edge. Say that h_i of the edges lie in butterfly B_i. Then each message starting in butterfly B_i has probability at most h_i/2^l that it will hit one of the edges. The sum, over all messages, of the probabilities of hitting one of the edges is at most $\sum_i 2^l \cdot h_i/2^l = \sum_i h_i = h$. From Lemma 2.7, the probability that more than h + a messages route through the edges is less than $\exp(-a^2/(2h + a))$.

If h ≥ n, then the probability that more than (k + 1)h messages pass through the edges is less than an inverse polynomial in N whose degree grows with k. Similarly, if h < n, the chance of having more than (k + 1)n messages crossing the set is also less than such an inverse polynomial. ■

Lemma 3.2. Take an arbitrary set of O(n^i) edges in the butterfly, i ≥ 2. Then with high probability the Valiant-Brebner routing scheme routes only O(n^i) messages through edges in the set, counting according to multiplicity.

Proof. Examine each level separately. With high probability, if level l has e_l edges from the set, then no more than O(e_l + n) messages will traverse an edge from the set at that level. Summing over all levels, we get that with high probability the number of messages crossing edges from the set is no more than

$$O\Big(\sum_l e_l + n^2\Big) = O(n^i).$$ ■

Consider a hypercube with faulty nodes, the butterfly derived from the cube, and a particular node v^i_{k-1} in the butterfly. We will need to know that such a node has ample opportunity to send a packet to the next level. Consider a path P_{v_{k-1},i,j} = (v^i_{k-1}, v^{ij}_{k-1}, v^{ij}_k, v^j_k). We assume that a message has successfully arrived at v^i_{k-1}, and so there are six components (three nodes and three edges) in the path that must all work properly. If the probability of failure is less than $1 - 2^{-1/6}$ and the faults are independent, then each such path has probability less than 1/2 that it has a faulty component. For subsequent analysis, we would like it to be the case that for all pairs v_{k-1}, i there are at least ζ_p n dimensions j which lead the message on a functioning path P_{v_{k-1},i,j}. We would also like to know that at least ζ_p n of the paths P'_{v_{k-1},i,j} = (v^i_{k-1}, v^{ij}_{k-1}, v^{ijk}_k, v^{jk}_k) are fault-free for all pairs v_{k-1}, i. We also need room to begin and end the routing. That is, we need that ζ_p n of the paths Q_{v_0,j} = (v_0, v^j_0) and of the paths Q'_{v_n,i,j} = (v^i_n, v^{ij}_n, v^j_n, v_n) contain no faults. If so many paths are available to all nodes, we say the butterfly is locally routable.

Lemma 3.3. If the probability that any component fails is less than $1 - 2^{-1/6}$ and all failures occur independently, then with high probability the butterfly is locally routable.

Proof. The paths available at any point are node-disjoint. Thus the faultiness of any path is independent of the other paths in the set. The proof is therefore similar to that of Lemma 2.2. ■

Lemma 3.4. Say a butterfly has faulty components but is locally routable. With high probability each message in the offset routing traverses a path of length O(n).


Proof. We will prove that any given message's path is of length O(n) with high probability. Since there are only N messages, this will imply the lemma. Assume that at some point in its route, the packet is at the node v^i_{k-1}, where v_{k-1} is the node it would traverse in the Valiant-Brebner scheme. Assume as well that the packet is scheduled to traverse dimension k. (If the straight edge is to be used or if the packet is at the beginning or end of the route, the analysis is identical.) Then if the packet successfully chooses to jump across dimension j, the path (v^i_{k-1}, v^{ij}_{k-1}, v^{ijk}_k, v^{jk}_k) must have no faults.

Since the butterfly is locally routable, ζ_p n of the possible paths to choose are fault-free. If a faulty path is chosen, no more than six steps are necessary to encounter the fault and to return to v^i_{k-1}. Since a random dimension is chosen at each step, the probability that a packet takes more than 6a(2n + 2) steps is less than the probability of at least (a - 1)(2n + 2) heads in a sequence of a(2n + 2) tosses of a coin with probability ζ_p of landing tails. This probability is less than

$$\binom{a(2n+2)}{2n+2}(1-\zeta_p)^{(a-1)(2n+2)} \le \left(\frac{e\,a(2n+2)}{2n+2}\right)^{2n+2}(1-\zeta_p)^{(a-1)(2n+2)} = \left(e\,a\,(1-\zeta_p)^{a-1}\right)^{2n+2},$$

an inverse polynomial in N for large enough a. ■
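The chain of inequalities at the end of this proof can be checked numerically. The sketch computes the exact probability of at most m = 2n + 2 tails in a·m tosses and compares it with the final expression (e·a·(1 - ζ_p)^{a-1})^m; the value ζ_p = 0.3 and the choices of a and n are arbitrary illustrations, and the function names are hypothetical.

```python
import math


def prob_few_tails(a, m, zeta):
    """Exact Pr[at most m tails in a*m tosses] when a toss lands tails with
    probability zeta, i.e. Pr[at least (a-1)*m heads]."""
    total = a * m
    return sum(math.comb(total, t) * zeta ** t * (1 - zeta) ** (total - t)
               for t in range(m + 1))


def proof_bound(a, m, zeta):
    """(e * a * (1 - zeta)**(a - 1))**m, the last bound in the proof of Lemma 3.4."""
    return (math.e * a * (1 - zeta) ** (a - 1)) ** m


for n in (5, 10):
    m = 2 * n + 2
    print(n, prob_few_tails(20, m, 0.3), proof_bound(20, m, 0.3))
```

Once a is large enough that e·a·(1 - ζ_p)^{a-1} < 1, the bound decays exponentially in 2n + 2 = Θ(log N), which is what makes it an inverse polynomial in N.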

Now that we know each message moves a distance of O(n) during an offset routing phase, we need to show that its forward movement is delayed by at most O(n) other packets. These facts together will bound the packet's time to its destination. We will show that few other packets choose virtual paths in such a way that they have a non-zero probability of selecting an offset path which congests a given node's path. We will then show that even fewer of those actually congest the path when they use offset paths. Note that a hypercube edge traversed by a given message may be traversed by other messages as either cross edges or jump edges. We consider these two cases separately.

Lemma 3.5. Consider a set E of O(n) hypercube edges and butterfly straight edges. Let S be the set of butterfly edges such that any packet whose virtual path crosses an edge in S has a non-zero probability of congesting an edge in E as a butterfly edge in its offset path. Then with high probability, there are O(n^3) packets whose virtual paths traverse any of the edges in S, counting a packet several times if it traverses several edges in S.

Proof. If (w_{l-1}, w^l_l) is a butterfly edge traversed by a packet's offset path, then the packet's virtual path must use an edge of the form (w^{ij}_{l-1}, w^{ijl}_l) for some pair i, j. There are only n^2 such pairs. The same reasoning would hold if the edge in question were a straight edge. Since |E| = O(n), |S| = O(n^3). By Lemma 3.2, only O(n^3) packets traverse edges in S, with high probability. ■


Lemma 3.6. Let T be the set of butterfly edges such that any packet whose virtual path crosses an edge in T has a non-zero probability of congesting some edge in E as a jump edge in its offset path. Then with high probability, there are O(n^3) packets whose virtual paths traverse any of the edges in T, again counting according to multiplicity.

Proof. Say (w_l, w'_l) is a jump edge traversed by a packet. Let P_{v_{k-1},i,j} or P'_{v_{k-1},i,j} be the path used by the packet when it traverses the jump edge. Then (w, w') is either the first or the last edge traversed in the path. If it is the first, then w_l = v^i_{k-1}, and therefore l = k - 1. The edge traversed in the virtual path would have been (v_{k-1}, v_k) or (v_{k-1}, v^k_k) for some k. There are n choices for v such that v^i = w and n choices for k. Thus there are only O(n^2) elements of T whose traversal in some packet's virtual path gives the packet a non-zero probability of traversing the edge (w, w') as a jump edge. The same reasoning holds for use of the jump edge as a third edge. Again, since |E| = O(n), |T| = O(n^3). By Lemma 3.2, only O(n^3) packets traverse edges in T, with high probability. ■

Lemmas 3.5 and 3.6 also hold for the set of edges incident to the set of nodes {v_l} for some hypercube node v. If we bound the number of packets congesting these edges, then we bound the number of packets ever residing in queues in the node v (the queuesize of v). The following two technical lemmas help bound how much congestion actually results from the possible sources.

Lemma 3.7. Consider a set of nonnegative integers {a_{rs} | 1 ≤ r ≤ z, 1 ≤ s ≤ σ(r)} where σ(r) ≥ ζ_p n for all r, Σ_{r,s} a_{rs} ≤ c n^3 and a_{rs} ≤ n for all pairs r, s. If exactly one index s_r is chosen uniformly in [1, σ(r)] for each index r, then with high probability Σ_r a_{r s_r} = O(n^2).

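Proof. Let X_r = a_{r s_r}. We wish to bound the value of X = Σ_r X_r. To do so, we bound the moment generating function M(λ) = E(e^{λX}). We can then bound

$$\Pr[X > bn^2] = \Pr[e^{\lambda X} > e^{\lambda b n^2}] \le e^{-\lambda b n^2} E(e^{\lambda X}).$$

This bound follows directly from Markov's inequality. We will first bound the moment generating functions

$$M_r(\lambda) = E(e^{\lambda X_r}) = \frac{1}{\sigma(r)} \sum_s e^{\lambda a_{rs}}.$$

We can then use the fact that, since the X_r are independent,

$$M(\lambda) = \prod_r M_r(\lambda).$$

If we could find a_{rs} ≤ a_{rt} and a positive integer b such that 0 ≤ a_{rs} - b and a_{rt} + b ≤ n, then by transferring b units this way we could only increase M_r(λ) (for positive λ). This follows because

$$e^{\lambda a_{rs}} - e^{\lambda(a_{rs}-b)} = e^{\lambda(a_{rs}-b)}(e^{\lambda b} - 1) \le e^{\lambda a_{rt}}(e^{\lambda b} - 1) = e^{\lambda(a_{rt}+b)} - e^{\lambda a_{rt}}.$$

By this reasoning, if A_r = Σ_s a_{rs} is fixed, we maximize M_r(λ) by setting all terms except possibly one equal to either 0 or n. Thus

$$E(e^{\lambda X_r}) \le \begin{cases} \dfrac{1}{\sigma(r)}\left(e^{\lambda A_r} + \sigma(r) - 1\right) & \text{if } A_r < n,\\[2ex] \dfrac{1}{\sigma(r)}\left(\lceil A_r/n \rceil e^{\lambda n} + \sigma(r) - \lceil A_r/n \rceil\right) & \text{if } A_r \ge n. \end{cases}$$

For the rest of the proof we fix λ = 1/n. If A_r < n then

$$M_r(1/n) \le \frac{1}{\sigma(r)}\left(e^{A_r/n} + \sigma(r) - 1\right) \le \frac{1}{\sigma(r)}\left(1 + \frac{2A_r}{n} + \sigma(r) - 1\right) \le 1 + \frac{2A_r}{\zeta_p n^2}.$$

(The second inequality uses the fact that for 0 ≤ γ ≤ 1, e^γ ≤ 1 + 2γ.) If A_r ≥ n then

$$M_r(1/n) \le \frac{1}{\sigma(r)}\left(\frac{2A_r}{n}\,e + \sigma(r)\right) \le 1 + \frac{2eA_r}{\zeta_p n^2}.$$

In either case the bound is at most 1 + 2eA_r/(ζ_p n^2) ≤ e^{2eA_r/(ζ_p n^2)}. Thus

$$M(1/n) \le \prod_r \exp\left(\frac{2eA_r}{\zeta_p n^2}\right) \le \exp\left(\frac{2ecn}{\zeta_p}\right).$$

Continuing the reasoning of the first paragraph of the proof,

$$\Pr[X > bn^2] \le e^{-bn}\,e^{2ecn/\zeta_p}.$$

Since N = 2^n, we can make this probability an arbitrarily large negative power of N by letting b be a large constant. ■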
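The quantities in this proof can be evaluated on a concrete instance. The sketch below computes the exact value of log M(1/n), the bound Σ_r 2eA_r/(ζ_p n^2) on it, and one sample of X. The instance is an assumption chosen for illustration: each row has σ(r) = ζ_p n entries, one equal to n and the rest 0 (the extremal shape identified by the transfer argument), and the entries total exactly n^3. Function names are hypothetical.

```python
import math
import random


def log_mgf(a, n):
    """Exact log M(1/n), where M(lam) = prod_r (1/sigma(r)) * sum_s exp(lam * a[r][s])."""
    return sum(math.log(sum(math.exp(x / n) for x in row) / len(row)) for row in a)


def log_mgf_bound(a, n, zeta):
    """The proof's bound: log M(1/n) <= sum_r 2e * A_r / (zeta * n^2)."""
    return sum(2 * math.e * sum(row) / (zeta * n * n) for row in a)


def sample_x(a, rng):
    """X = sum_r a[r][s_r], with each s_r drawn uniformly and independently."""
    return sum(rng.choice(row) for row in a)


n, zeta = 20, 0.5
a = [[n] + [0] * 9 for _ in range(400)]   # sigma(r) = 10 = zeta*n, total = n**3
print(log_mgf(a, n), log_mgf_bound(a, n, zeta))
print(sample_x(a, random.Random(0)))
```

Here log M(1/n) is roughly 63 while the bound is roughly 217; the bound is loose by a constant factor, which is all the proof needs, since it only has to be O(n) in the exponent.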

Lemma 3.8. Consider a set of nonnegative integers {β_r | 1 ≤ r ≤ z} where Σ_r β_r = O(n^2) and β_r = O(n) for all r. Let {g_r} be a set of random variables with geometric distributions g_r ~ G(ζ_p) (i.e. g_r = a with probability ζ_p(1 - ζ_p)^{a-1}). Then with high probability, Σ_r g_r β_r = O(n^2).

Proof. Order the integers by increasing size β_1 ≤ β_2 ≤ ... ≤ β_z. Then since the n integers β_{kn}, β_{kn+1}, ..., β_{(k+1)n-1} are all at least as large as β_{kn}, we know that Σ_k β_{kn} = O(n). We know that β_z = O(n), so the sum β_z + Σ_{k≥1} β_{kn} = O(n).

Now with high probability, all sums Σ_{r=1}^{n} g_{kn+r} are O(n). We know that

$$\sum_r g_r \beta_r \le \sum_k \Big(\sum_{r=1}^{n} g_{kn+r}\Big)\beta_{(k+1)n},$$

where β_r is read as β_z for r > z. Thus, with high probability, Σ_r g_r β_r = O(n^2). ■

Theorem 3.9. If we route using offset routing and the hypercube is locally routable, then with high probability, all packets are delivered in O(log N) steps and all nodes have total queuesize O(log N).

Proof. Focus on the path p_0 of a particular message m_0. We will show that the congestion along p_0 from various sources is O(n) with high probability.

Lemmas 3.5 and 3.6 bound the number of messages which have the potential to congest an edge of m_0's path while passing between levels on their own paths. Enumerate the packets m_1, m_2, ..., m_z which have a non-zero probability of congesting p_0 while traversing an edge from an even level to an odd level in their virtual paths. A particular packet may appear several times in the enumeration, once for each even-level node along its virtual path from which it might congest an edge of p_0.

The packet m_r has at least ζ_p n paths which would successfully route it to the next level. Arbitrarily designate exactly ζ_p n of these paths as special. For the purposes of our analysis, we require m_r to choose a special path before we allow it to route to the next level. This can only increase the amount of congestion placed on any edge, since it increases the number of attempts made by each packet. However, once m_r does choose a special path, we always place it in the last node of the first fault-free path it found. Thus m_r winds up in the same place on the next level as if no special requirements had been made.

Consider the choice of offsets made by the message m_r at even level l_r. Let q_r be the number of choices of pairs of offset dimensions (i, j) for the message m_r which would congest an edge in m_0's path. Then Σ_r q_r = O(n^3) by Lemmas 3.5 and 3.6. (Σ_r q_r is a second way to count the number of edges in S and T according to multiplicity.)

The choice of the dimension i was actually made for m_r at level l_r - 1. The choice was made randomly and uniformly from the set of offsets which led to a fault-free path to level l_r. The exact selections of offsets i are dependent from packet to packet and, for a particular packet, from one level to the next. However, no matter how we condition on previous events, there are always enough offsets to choose from at any given moment. Also, the bounds on the probabilities of congesting p_0 will hold regardless of previous events. Let i_1 < i_2 < ... < i_{σ(r)}, σ(r) ≥ ζ_p n, be the choices of offsets at level l_r - 1 which lead to a fault-free path to level l_r. Let a_{rs} equal the number of offsets j such that if m_r is routed from level l_r - 1 to level l_r using offset i_s, and then to level l_r + 1 using offset j, then congestion results in m_0's path. Then since Σ_s a_{rs} ≤ q_r, Σ_{r,s} a_{rs} = O(n^3). Since the total number of offsets j is n, clearly a_{rs} ≤ n. Let i_{s_r} be the offset for m_r actually chosen at level l_r - 1. Lemma 3.7 implies that with high probability Σ_r a_{r s_r} = O(n^2).

Set β_r = a_{r s_r}. At level l_r, whether the message m_r chooses a path from the set of ζ_p n special paths or the set of (1 - ζ_p)n nonspecial paths, it has at most β_r choices which congest m_0's path. Thus whether we condition on the choice being special or nonspecial, the probability that message m_r will congest m_0's path is bounded by β_r/(ζ_p n).


The number of routing attempts made by m_r is g_r ~ G(ζ_p). On each attempt, the probability that m_r will congest m_0's path is at most β_r/(ζ_p n). Each attempt is an independent trial, and the sum of the probabilities of congestion in the trials is at most (1/(ζ_p n)) Σ_r g_r β_r, which is O(n) by Lemma 3.8. By Lemma 2.7, with high probability O(n) attempts actually did congest m_0's path. Since each attempt involves at most six edges, each attempt can add at most six to the congestion on m_0's path. Thus with high probability, the total congestion on the path from routing attempts at even levels is O(n).

Next examine the congestion on p_0 from other packets beginning and ending their paths. For a packet to congest an edge as the first jump edge of its path, it has to be generated by one of the edge's endpoints. Thus there are at most O(n) such packets. Now consider those packets congesting p_0 during the ending of their paths. Each of the three jump edges used to finish off a path has an endpoint which is at distance one from the virtual destination. Thus at most O(n) packets exist which have the potential to congest any given edge as the first, second or third of these jump edges. Therefore a total of O(n^2) packets have a non-zero probability of congesting some edge of p_0 as they finish their routes. An argument along the lines of the one bounding congestion at even levels shows that congestion from these sources is O(n) as well.

The same argument bounds congestion from routing attempts at odd levels, and also bounds congestion on edges incident to any fixed node. ■

4  Fault  Tolerant  Routing 

The offset routing algorithm cannot tolerate faults which occur during a particular routing phase. If a packet resides in a node as it fails, that packet is irretrievably lost. Rabin ([R]) discovered how to use the technique of information dispersal to route even in the presence of failing nodes, provided each fault occurs
with  probability  no  more  than  0(~ j). 

In this section we present a simpler variation of Rabin's algorithm. We also show that our algorithm handles faults occurring with probability $O(1/n)$. First, we briefly sketch the main ideas of the original routing algorithm. Each packet is dispersed into $n$ pieces, which are sent along node-disjoint paths to different locations and then along node-disjoint paths to the final destination.

Since every piece needs to carry $\Omega(n)$ bits of routing information, the original packets must necessarily be large. For concreteness, we assume that all packets contain $m = \Omega(n^2)$ bits. Any piece created will contain $O(m/n)$ bits. We also assume that all links and nodes have the capacity to hold a constant number of the original packets (and therefore $\Theta(n)$ pieces).

Rabin proves that, with high probability, the number of pieces crossing any node or link never exceeds its capacity. This guarantees that each piece can move during every step and that the entire routing will take no more than $2(n + 1)$ steps: $n + 1$ steps for each piece to arrive at its random intermediate location and another $n + 1$ to arrive at its final destination. No piece's progress is ever delayed by a full queue in the node ahead.

As Rabin points out, routing with dispersal of information can tolerate faults if the dispersal into pieces is done with more redundancy. The pieces may actually be constructed in such a way that the arrival of half (or some other constant fraction) of them is enough to reconstruct the original message. Rabin shows how to do this through matrix multiplication. He then proves that if each link fails with a sufficiently small probability, then with high probability all messages will be safely reconstructed at their destinations.
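A minimal sketch of dispersal by matrix multiplication follows, to make the "any half of the pieces suffices" property concrete. It is not Rabin's exact construction: it assumes a prime field GF(257) and uses Vandermonde rows as the encoding vectors, since any $k$ distinct rows of a Vandermonde matrix over a field are linearly independent. The names `P`, `encode`, `inverse_mod`, and `decode` are hypothetical.

```python
# Hypothetical sketch of information dispersal over the prime field GF(P).
# Assumption: Vandermonde encoding rows; any set of n vectors in which
# every k are linearly independent would serve the same purpose.
P = 257  # prime larger than 255, so every byte is a field element


def encode(data, n, k):
    """Split `data` into n pieces such that any k of them determine it."""
    if n >= P:
        raise ValueError("need n < P distinct evaluation points")
    symbols = list(data) + [0] * (-len(data) % k)  # pad to a multiple of k
    pieces = []
    for x in range(1, n + 1):  # one distinct nonzero evaluation point per piece
        row = [pow(x, e, P) for e in range(k)]
        body = [sum(r * s for r, s in zip(row, symbols[b:b + k])) % P
                for b in range(0, len(symbols), k)]
        pieces.append((x, body))
    return pieces


def inverse_mod(matrix, p):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(k)] for r, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], p - 2, p)  # Fermat inverse, valid since p is prime
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def decode(pieces, k, length):
    """Rebuild the original `length` bytes from any k surviving pieces."""
    if len(pieces) < k:
        raise ValueError("too few pieces survived")
    chosen = pieces[:k]
    a_inv = inverse_mod([[pow(x, e, P) for e in range(k)] for x, _ in chosen], P)
    out = []
    for b in range(len(chosen[0][1])):
        y = [body[b] for _, body in chosen]
        out.extend(sum(a * v for a, v in zip(row, y)) % P for row in a_inv)
    return bytes(out[:length])


packet = b"one packet, dispersed into pieces"
pieces = encode(packet, n=8, k=4)
assert decode(pieces[1::2], k=4, length=len(packet)) == packet  # half were lost
```

With $k = n/2$ each piece carries about $1/k$ of the packet, so total traffic roughly doubles; in this toy field a symbol can equal 256 and therefore needs 9 bits.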


Our improvement of Rabin's results stems from a more uniform and efficient selection of paths for the routing of pieces. The $n$ pieces are first sent to the neighbors of the node which generated the packet. These pieces are then routed along parallel paths to the neighbors of a random intermediate node. From there the pieces are routed along parallel paths to the neighbors of the intended destination, and from there to the destination itself.

If $v$ and $w$ are two hypercube nodes, let $\pi_i(v, w)$ be the path from $v^i$ to $w^i$ used in one phase of the Valiant-Brebner scheme, where $v^i$ denotes $v$ with its $i$th bit complemented (the neighbor of $v$ across dimension $i$). Let $\Pi(v, w) = \{\pi_i(v, w) \mid 1 \le i \le n\}$ be the set of all such paths. We first show that if each node $v$ chooses a node $v'$ uniformly at random and then routes a different piece along each of the $n$ paths in $\Pi(v, v')$, then only $O(n)$ pieces reside in any node's queue at any time step.
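The paths in $\Pi(v, w)$ can be made concrete with a small sketch. It assumes nodes are bit tuples with dimension 1 as the leftmost bit, and that the $j$th node of $\pi_i(v, w)$ (counting $v^i$ as node 0) is $w_1 \ldots w_j v_{j+1} \ldots v_n$ with bit $i$ complemented, so consecutive entries coincide whenever $v_j = w_j$. The helper names `flip`, `pi`, `paths`, `piece_route`, and `all_nodes` are hypothetical.

```python
from itertools import product


def all_nodes(n):
    """Every node of the n-dimensional hypercube, as a tuple of bits."""
    return list(product((0, 1), repeat=n))


def flip(node, i):
    """Return node^i: `node` with dimension i (1-based, leftmost = 1) complemented."""
    bits = list(node)
    bits[i - 1] ^= 1
    return tuple(bits)


def pi(i, v, w):
    """Nodes 0..n of pi_i(v, w): fix bits toward w in order 1..n, keeping bit i flipped."""
    n = len(v)
    return [flip(w[:j] + v[j:], i) for j in range(n + 1)]


def paths(v, w):
    """The n parallel paths pi_1(v, w), ..., pi_n(v, w) forming Pi(v, w)."""
    return [pi(i, v, w) for i in range(1, len(v) + 1)]


def piece_route(i, v, w):
    """Route of the i-th piece: v -> v^i, along pi_i(v, w) to w^i, then -> w."""
    return [v] + pi(i, v, w) + [w]


v, w = (0, 0, 0), (1, 1, 0)
print(pi(1, v, w))  # [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0)]
```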

Lemma 4.1. Consider the collection of all paths in the $N$ sets $\Pi(v, v')$ (varying over $v$), where each hypercube node $v$ has chosen a node $v'$ randomly and uniformly. For any node $u$ and any integer $0 \le j \le n$, with high probability $u$ is the $j$th node along only $O(n)$ paths in the collection.

Proof. If $u$ is the $j$th node along the path $\pi_i(v, w)$, then $u^i = w_1 \ldots w_j v_{j+1} \ldots v_n$. Separate the two cases $i \le j$ and $i > j$. If $i \le j$, then it must be that $v_{j+1} \ldots v_n = u_{j+1} \ldots u_n$. Precisely $2^j$ nodes $v$ satisfy this condition. If one of these nodes chooses a $w$ such that $w_1 \ldots w_{i-1} \bar{w}_i w_{i+1} \ldots w_j = u_1 \ldots u_j$ for some $i \le j$, then $u$ will be the $j$th node along exactly one path $\pi_i(v, w)$. Otherwise, $u$ will be the $j$th node along none of the paths $\pi_i(v, w)$, $i \le j$. Since exactly $j$ of the $2^j$ equally likely prefixes $w_1 \ldots w_j$ differ from $u_1 \ldots u_j$ in a single position, for each of the $2^j$ nodes the probability of exactly one such path is $j/2^j$ and the probability of no such path is $1 - j/2^j$.

If $i > j$, then $v_{j+1} \ldots v_{i-1} \bar{v}_i v_{i+1} \ldots v_n = u_{j+1} \ldots u_n$ for some $i > j$. Precisely $(n - j)2^j$ nodes satisfy this condition. All reasoning is the same as in the previous case, except now $w$ must be chosen so that $w_1 \ldots w_j = u_1 \ldots u_j$. Thus the probability that $u$ is the $j$th node along exactly one such path is $1/2^j$. The probability that no path $\pi_i(v, w)$, $i > j$, crosses $u$ in this fashion is $1 - 1/2^j$.

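We now need only consider the sum of $2^j$ 0-1 random variables, each with probability $j/2^j$ of equalling 1, and $(n - j)2^j$ 0-1 random variables, each with probability $1/2^j$ of equalling 1; these variables are independent because the nodes choose independently. Call this sum $X$. Then the moment generating function $M(\lambda)$ for $X$ satisfies

$$M(\lambda) = \left(1 + \frac{j}{2^j}\left(e^{\lambda} - 1\right)\right)^{2^j} \left(1 + \frac{1}{2^j}\left(e^{\lambda} - 1\right)\right)^{(n-j)2^j} \le e^{j(e^{\lambda} - 1)}\, e^{(n-j)(e^{\lambda} - 1)} = e^{n(e^{\lambda} - 1)}.$$

Thus $\Pr[X > \alpha n] \le e^{n(e^{\lambda} - 1)} e^{-\alpha n \lambda} = \left(e^{e^{\lambda} - 1 - \alpha\lambda}\right)^{n}$. Setting $\lambda = \ln \alpha$, this implies $\Pr[X > \alpha n] \le \left(e^{\alpha(1 - \ln \alpha) - 1}\right)^{n}$, a bound which can be made as small as desired by increasing the constant $\alpha$. ∎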
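Lemma 4.1 can be checked empirically on small cubes. The sketch below (hypothetical helper names, pseudo-random choices from Python's `random` module) counts, for every node $u$ and step $j$, how many of the $nN$ paths have $u$ as their $j$th node, and evaluates the tail bound $(e^{\alpha(1-\ln\alpha)-1})^n$ obtained by setting $\lambda = \ln\alpha$. Because every path has exactly one $j$th node, the loads at each step sum to $nN$, so the mean load is exactly $n$, matching $E[X] = 2^j \cdot j/2^j + (n-j)2^j \cdot 1/2^j = n$.

```python
import math
import random
from collections import Counter
from itertools import product


def flip(node, i):
    """Complement dimension i (1-based, leftmost = 1) of a bit tuple."""
    bits = list(node)
    bits[i - 1] ^= 1
    return tuple(bits)


def step_loads(n, rng):
    """Counter mapping (u, j) -> number of paths having u as their j-th node."""
    nodes = list(product((0, 1), repeat=n))
    loads = Counter()
    for v in nodes:
        w = rng.choice(nodes)  # v's uniformly random target v'
        for i in range(1, n + 1):
            for j in range(n + 1):
                # j-th node of pi_i(v, w): w_1..w_j v_{j+1}..v_n with bit i flipped
                loads[(flip(w[:j] + v[j:], i), j)] += 1
    return loads


def tail_bound(n, alpha):
    """(e^{alpha(1 - ln alpha) - 1})^n, the bound on Pr[X > alpha n]."""
    return math.exp(n * (alpha * (1 - math.log(alpha)) - 1))


n = 8
loads = step_loads(n, random.Random(1))
print("max load:", max(loads.values()), "mean load:", n)
print("tail bound for alpha = 4:", tail_bound(n, 4.0))
```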

The $i$th piece created from $v$'s packet is sent to $v^i$, along the path $\pi_i(v, w)$ to $w^i$, and then to $w$. By lemma 4.1 (and a union bound over all nodes and time steps), with high probability at no time do more than $\alpha n$ pieces cross a given hypercube node. Since the pieces traversing any link all come from one of the link's endpoints, no more than $2\alpha n$ pieces cross the link during any step of the routing. If all links and nodes have the capacity to hold $2\alpha$ original packets, then with high probability no buffering is necessary and no piece waits in a queue.

This  analysis  assumes  that  each  node  routes  its  packet 
to  a  random  destination.  If  we  use  two  phases  as  in  the 
Valiant-Brebner  scheme,  the  results  extend  to  arbitrary 
permutation  routings. 

Theorem 4.2. If all packets are divided into $n$ pieces which are routed along parallel paths in both phases of the routing algorithm, then for an arbitrary permutation, with high probability the two-phase routing takes $2(n + 1)$ steps. No piece waits at any time.

If we encode the original packet in the pieces via Rabin's matrix multiplication, then we can bound the probability that $v$'s packet is lost by the probability that some $n/2$ of its pieces run into faulty components. But if that many pieces are lost, then at least $n/4$ are lost during one of the two phases of the routing algorithm. Assume they are lost in the first phase; the reasoning for phase 2 is identical. There are at most $(2n + 3)n$ different components (nodes or links) encountered by pieces from $v$ during the first phase. We need the following bound on the number of intersections between the routes of different pieces.

Lemma 4.3. For any hypercube node $u \ne v, w$, no more than two paths in $\Pi(v, w)$ cross $u$.

Proof. Count the nodes along the path $\pi_i(v, w)$ starting with $v^i$ as the 0th node. Say that the $k$th node along $\pi_i(v, w)$ is the same as the $l$th node along $\pi_j(v, w)$ for $i < j$. Then

$$w^i_1 \ldots w^i_k\, v^i_{k+1} \ldots v^i_n = w^j_1 \ldots w^j_l\, v^j_{l+1} \ldots v^j_n,$$

where $v^{q'}_q = v_q$ iff $q \ne q'$, and similarly for $w^{q'}$.

There are four cases. If $k, l < j$, then $v_j = \bar{v}_j$, a contradiction. Similarly, if $k, l \ge i$, then $\bar{w}_i = w_i$, a contradiction. If $k < i$ and $l \ge j$, or if $l < i$ and $k \ge j$, then it must be true that $w_i = \bar{v}_i$, $w_j = \bar{v}_j$, and $w_h = v_h$ for $i < h < j$. Thus all $\pi_h(v, w)$ with $i < h < j$ are precluded from crossing $u$ (otherwise $w_h = \bar{v}_h$, a contradiction). Therefore three paths cannot all cross $u$. ∎
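For small $n$, lemma 4.3 can be confirmed exhaustively. The sketch below assumes the same path convention as the proof ($k$th node of $\pi_i(v, w)$ is $w_1 \ldots w_k v_{k+1} \ldots v_n$ with bit $i$ complemented), counts each path at most once per node, and reports the largest number of paths in $\Pi(v, w)$ sharing a node $u \ne v, w$; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def flip(node, i):
    """Complement dimension i (1-based, leftmost = 1) of a bit tuple."""
    bits = list(node)
    bits[i - 1] ^= 1
    return tuple(bits)


def pi(i, v, w):
    """Nodes 0..n of pi_i(v, w)."""
    n = len(v)
    return [flip(w[:j] + v[j:], i) for j in range(n + 1)]


def max_paths_through_node(n):
    """Largest number of paths in Pi(v, w) crossing one node u != v, w, over all v, w."""
    nodes = list(product((0, 1), repeat=n))
    worst = 0
    for v in nodes:
        for w in nodes:
            hits = Counter()
            for i in range(1, n + 1):
                for u in set(pi(i, v, w)):  # a path counts once even if it pauses at u
                    if u != v and u != w:
                        hits[u] += 1
            worst = max([worst, *hits.values()])
    return worst


for n in range(1, 6):
    print(n, max_paths_through_node(n))
```

For $n \ge 2$ the bound of two is attained, e.g. with $v = 000$, $w = 110$, where $\pi_1$ and $\pi_2$ both pass through $100$ and $010$.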

Since no component's failure will affect more than two pieces, at least $n/8$ of the $(2n + 3)n$ components must have failed.

Theorem 4.4. Given a sufficiently large constant $c$, if each component of the hypercube fails independently with probability $\frac{1}{cn}$ before or during some permutation routing, then with high probability the routing will be successfully completed. That is, a given packet will arrive at its destination iff neither its origin nor its destination fails.

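Proof. Whether or not the $i$th component fails gives rise to a 0-1 random variable whose moment generating function is $M_i(\lambda) = \frac{1}{cn}e^{\lambda} + \left(1 - \frac{1}{cn}\right)$. Thus the moment generating function for the sum of these random variables is

$$M(\lambda) = \left(1 + \frac{e^{\lambda} - 1}{cn}\right)^{(2n+3)n} \le \exp\left(\frac{(e^{\lambda} - 1)(2n+3)}{c}\right).$$

Thus we can bound the probability that at least $n/8$ of the components fail by $\exp\left(\frac{(e^{\lambda} - 1)(2n+3)}{c} - \frac{\lambda n}{8}\right)$. Setting $\lambda = \ln\frac{c}{16}$, we see that the probability of so many failures is no more than $e^{(2n+3)/16}\left(\frac{16}{c}\right)^{n/8} = \left(e^{\frac{2n+3}{16n}}\left(\frac{16}{c}\right)^{1/8}\right)^{n}$. This bound can be made as low as desired by increasing the constant $c$. ∎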
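The estimate in this proof can be sanity-checked numerically. The sketch compares the exact binomial probability that at least $n/8$ of $(2n+3)n$ independent components fail, each with probability $1/(cn)$, against the Chernoff expression $\exp((e^\lambda - 1)(2n+3)/c - \lambda n/8)$ at $\lambda = \ln(c/16)$ and against the closed form $(e^{(2n+3)/(16n)}(16/c)^{1/8})^n$. The choice of $\lambda$ and the function names are this sketch's assumptions, and $c > 16$ is required so that $\lambda > 0$.

```python
import math


def exact_tail(n, c):
    """Pr[at least n/8 of the (2n+3)n components fail], each failing w.p. 1/(cn)."""
    trials, p = (2 * n + 3) * n, 1 / (c * n)
    t = math.ceil(n / 8)
    # complement of the few small terms, to avoid huge binomial coefficients
    below = math.fsum(math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
                      for k in range(t))
    return 1.0 - below


def chernoff(n, c):
    """exp((e^lam - 1)(2n+3)/c - lam n/8) with lam = ln(c/16); needs c > 16."""
    lam = math.log(c / 16)
    return math.exp((math.exp(lam) - 1) * (2 * n + 3) / c - lam * n / 8)


def closed_form(n, c):
    """(e^{(2n+3)/(16n)} (16/c)^{1/8})^n, an upper bound on chernoff(n, c)."""
    return (math.exp((2 * n + 3) / (16 * n)) * (16 / c) ** 0.125) ** n


for n, c in [(16, 64), (32, 256), (64, 1024)]:
    print(n, c, exact_tail(n, c), chernoff(n, c), closed_form(n, c))
```

The closed form drops the factor $e^{-(2n+3)/c}$, which is why it sits slightly above the Chernoff value.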

An increase in the probability of failure by a constant factor can be tolerated by increasing the size of the pieces.

Note that offset routing and information dispersal are complementary techniques. By combining this simplified variant of information dispersal with offset routing, still better results are possible, at least in the theoretical setting. The combined routing algorithm tolerates the failure of a constant fraction of the hypercube's components during the course of the routing of a single permutation. To send a packet, the node first disperses pieces to a well-defined set of $n$ nodes at distance 3 (instead of to its neighbors). The pieces are then routed along parallel offset paths to the symmetric set of $n$ nodes close to the destination. Finally, the pieces are combined at the destination. If each node or link fails independently of other components, and if, when it fails, it does so at a random time during the routing, then this combined algorithm tolerates failure rates of a constant fraction of the hypercube's components.


5  Open  Questions 

The hypercube is the first network with small node degree which is known to be reconfigurable (with high probability) with only constant slowdown when a constant fraction of its nodes and edges fail. It remains open whether any constant-degree network shares this property. In particular, it would be of interest to determine whether similar results hold for the butterfly.